Merge RedistributeCPU and RedistributeGPU into one implementation #5268
atmyers merged 49 commits into AMReX-Codes:development from
Conversation
GitLab CI 1529965 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1529965.
/run-hpsf-gitlab-ci
GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1532660.
GitLab CI 1532660 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1532660.
Note: the regression tests for AMReX apps on gaira, garuda, and biollante look good, so long as these PRs are merged along with this one:
I also confirmed that GPU performance has not regressed, despite doing a little more work to support tiling. (Timing comparisons for this PR vs. development were attached as images.)
I have added back in an assertion that tiling is off on the GPU if neighbor particles are used. I will remove this limitation and add a better neighbor particles test in a follow-up PR.
PR #5268 allowed particle tiling on the GPU in Redistribute, but did not extend this support to neighbor particles. This PR adds that support. It also extends the existing test to exercise tiling and to check more carefully that the ghosted particle data is exactly right. Note that this test now reproduces all the particles on all ranks and is therefore not suitable for running on many ranks.

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate
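The idea behind the strengthened test — replicating all particles on all ranks so ghosted copies can be checked exactly — can be sketched as follows. This is an illustrative standalone sketch, not the AMReX test code; the `TestParticle` struct and `ghosts_match_reference` function are hypothetical names introduced here.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for an AMReX particle: a unique id
// plus a position.
struct TestParticle {
    long id;
    double x, y, z;
};

// Check that every ghost (neighbor) particle is an exact copy of a
// particle in a globally replicated reference set. Because the reference
// holds every particle on every rank, the comparison can be exact rather
// than statistical -- which is also why the approach does not scale to
// many ranks.
bool ghosts_match_reference(const std::vector<TestParticle>& all_particles,
                            const std::vector<TestParticle>& ghosts)
{
    std::unordered_map<long, const TestParticle*> by_id;
    for (const auto& p : all_particles) { by_id[p.id] = &p; }

    for (const auto& g : ghosts) {
        auto it = by_id.find(g.id);
        if (it == by_id.end()) { return false; }  // ghost with unknown id
        const TestParticle& ref = *it->second;
        // Require bitwise-equal positions: ghost copies must not drift.
        if (g.x != ref.x || g.y != ref.y || g.z != ref.z) { return false; }
    }
    return true;
}
```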
This merges RedistributeCPU and RedistributeGPU into one shared implementation that works for both, improving the maintainability of the code base. It also restructures the way OpenMP parallelism works in Redistribute on the CPU, resulting in better OpenMP performance and scaling. Another consequence is that particle tiling is now supported on GPU (although probably not desirable in most cases).
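The shape of such a merged implementation can be sketched in plain C++: write the redistribute body once against a generic `ParallelFor`-style launcher, and let the backend decide whether the loop runs on host threads or the device. Everything below is an illustrative sketch under that assumption — `ParallelForHost` and `redistribute_1d` are hypothetical names, not the AMReX API.

```cpp
#include <vector>

// Stand-in for a ParallelFor launcher: on CPU this is a plain loop (or an
// OpenMP loop over tiles); a GPU backend would launch the same lambda as
// a device kernel. The kernel body is written once for both.
template <typename F>
void ParallelForHost(int n, F&& f)
{
    for (int i = 0; i < n; ++i) { f(i); }
}

// One shared redistribute body: flag particles that remain inside the
// local domain [lo, hi), then compact the kept ones. In the real code the
// flagged "movers" would be shipped to their owning rank/tile; here they
// are simply dropped to keep the sketch self-contained.
std::vector<double> redistribute_1d(const std::vector<double>& pos,
                                    double lo, double hi)
{
    const int n = static_cast<int>(pos.size());
    std::vector<char> keep(n);
    ParallelForHost(n, [&](int i) {
        keep[i] = (pos[i] >= lo && pos[i] < hi);
    });
    std::vector<double> out;
    for (int i = 0; i < n; ++i) {
        if (keep[i]) { out.push_back(pos[i]); }
    }
    return out;
}
```

The maintainability win comes from having a single kernel body: a fix or optimization made once applies to both backends.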
Performance results on a redistribute benchmark with 2 MPI ranks on a Perlmutter CPU node, as a function of the number of OpenMP threads ("run" is this branch, "dev" is development): the new version is always an improvement and, at high thread counts, is roughly 2x faster or more.
When compiled for CPU with USE_OMP=FALSE, the new implementation is about 25% faster on that same benchmark, mostly owing to a new early exit in the partition step. The difference is more dramatic in cases with more particles per cell, like this example from WarpX.
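The early-exit idea can be illustrated with a minimal sketch (simplified, not the AMReX source): before partitioning particles into "stays here" / "must move", first scan for the first misplaced element. When none is found — the common case when few particles cross box boundaries per step — the partition and subsequent permutation work are skipped entirely.

```cpp
#include <algorithm>
#include <vector>

// Partition v so elements satisfying `stays` come first, returning the
// count of such elements, but exit early if v is already partitioned.
// The early scan is a cheap read-only pass; the swap-heavy
// std::partition runs only from the first misplaced element onward.
template <typename Pred>
int partition_with_early_exit(std::vector<int>& v, Pred stays)
{
    auto first_bad = std::find_if_not(v.begin(), v.end(), stays);
    if (first_bad == v.end()) {
        // Already partitioned: nothing moved, no permutation needed.
        return static_cast<int>(v.size());
    }
    auto mid = std::partition(first_bad, v.end(), stays);
    return static_cast<int>(mid - v.begin());
}
```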